1 July 2025
Aside - Extreme value modelling
- Consider the problem of producing high-resolution risk maps of some climate variable for a very large region

Annual maxima
- Our favourite (easiest!) extremes to model are annual maxima

Extreme value distribution fits

100-year return level

Aims
- For ADD-TREES we need hi-res climate data to drive JULES
- we’re aiming for parcel level
- no daily climate data are available at such high res, so we need to downscale some lower res data
- mesoclim will downscale UKCP18’s 12km data to parcel level
- this downscaled data can drive JULES
- But, UKCP18’s 12km data is only available for RCP8.5, which gives a rather incomplete picture of future climate
Method - Temperatures I
- Consider a single UKCP18 ensemble member
- Let \(Y_{t, i}\) denote the temperature on day \(t\) for grid cell \(i\)
- The collection of temperatures for a day is \(\mathbf{Y}_{t} = (Y_{t, 1}, \ldots, Y_{t, n})^{\textsf{T}}\), and for UKCP18 we have \(n = 9184\) in its original form, which covers the UK and Ireland
- A relatively flexible statistical model for \(\mathbf{Y}_{t}\) is a multivariate Gaussian distribution, i.e. \[
\mathbf{Y}_{t} \sim MVN_n(\boldsymbol{\mu}, \boldsymbol{\Sigma})
\] and we can easily estimate \(\boldsymbol{\mu}\) and \(\boldsymbol{\Sigma}\) as \[
\hat{\boldsymbol{\mu}} = \dfrac{1}{T} \sum_{t = 1}^T \mathbf{y}_t,~~
\hat{\boldsymbol{\Sigma}} = \dfrac{1}{T - 1} \sum_{t = 1}^T (\mathbf{y}_t - \hat{\boldsymbol{\mu}}) (\mathbf{y}_t - \hat{\boldsymbol{\mu}})^{\textsf{T}}
\]
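A minimal Python sketch of the estimation and simulation step above, using NumPy. The toy dimensions, the seed and the made-up `true_mu` and `true_Sigma` are illustrative assumptions, not UKCP18 values.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stand-in for one ensemble member: T days (rows) by n grid cells (columns).
# The real data have n = 9184; a small n keeps the sketch fast.
T, n = 500, 4
true_mu = np.array([10.0, 11.0, 9.0, 12.0])
true_Sigma = 4.0 * 0.8 ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
y = rng.multivariate_normal(true_mu, true_Sigma, size=T)

# Sample mean and covariance, matching the two estimators above.
mu_hat = y.mean(axis=0)
resid = y - mu_hat
Sigma_hat = resid.T @ resid / (T - 1)

# Simulate new days from the fitted multivariate Gaussian.
y_sim = rng.multivariate_normal(mu_hat, Sigma_hat, size=1000)
```

Days simulated this way are independent of one another, which is the limitation the next slide addresses.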
UKCP18 and simulated temperatures

Method - Temperatures II
- For temperatures, it’s important to capture day-to-day variability
- assuming a day’s temperatures are independent of yesterday’s isn’t realistic
- We can extend the multivariate Gaussian model to be \(2n\)-dimensional \[
\begin{bmatrix}
\mathbf{Y}_{t} \\
\mathbf{Y}_{t+1}
\end{bmatrix}
\sim MVN_{2n}\left(
\begin{bmatrix}
\boldsymbol{\mu} \\
\boldsymbol{\mu}
\end{bmatrix},\,
\begin{bmatrix}
\boldsymbol{\Sigma} & \boldsymbol{\Psi}\\
\boldsymbol{\Psi}^{\textsf{T}} & \boldsymbol{\Sigma}
\end{bmatrix}
\right)
\] and can easily estimate \(\boldsymbol{\Psi}\), which we can use to simulate \[
\mathbf{Y}_{t+1} \mid \mathbf{Y}_{t} = \mathbf{z} \sim MVN_n\left( \tilde{\boldsymbol{\mu}}(\mathbf{z}), \tilde{\boldsymbol{\Sigma}}\right)
\]
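A hedged sketch of the day-to-day simulation, assuming \(\boldsymbol{\Psi} = \text{Cov}(\mathbf{Y}_t, \mathbf{Y}_{t+1})\) and the standard Gaussian conditioning results \(\tilde{\boldsymbol{\mu}}(\mathbf{z}) = \boldsymbol{\mu} + \boldsymbol{\Psi}^{\textsf{T}} \boldsymbol{\Sigma}^{-1} (\mathbf{z} - \boldsymbol{\mu})\) and \(\tilde{\boldsymbol{\Sigma}} = \boldsymbol{\Sigma} - \boldsymbol{\Psi}^{\textsf{T}} \boldsymbol{\Sigma}^{-1} \boldsymbol{\Psi}\). The toy series and the helper name `simulate` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
n, T = 3, 2000

# Toy daily series with lag-one dependence (a made-up stand-in for UKCP18).
y = np.zeros((T, n))
for t in range(1, T):
    y[t] = 0.7 * y[t - 1] + rng.normal(size=n)
y += 10.0

mu_hat = y.mean(axis=0)
r0, r1 = y[:-1] - mu_hat, y[1:] - mu_hat
Sigma_hat = (y - mu_hat).T @ (y - mu_hat) / (T - 1)
# Psi_hat estimates Cov(Y_t, Y_{t+1}), the off-diagonal block of the joint model.
Psi_hat = r0.T @ r1 / (T - 2)

# B = Psi^T Sigma^{-1}, using the symmetry of Sigma.
B = np.linalg.solve(Sigma_hat, Psi_hat).T
Sigma_tilde = Sigma_hat - B @ Psi_hat
Sigma_tilde = (Sigma_tilde + Sigma_tilde.T) / 2  # remove round-off asymmetry

def simulate(z0, n_days):
    """Simulate n_days of temperatures, each conditioned on the previous day."""
    out = np.empty((n_days, len(z0)))
    z = np.asarray(z0, dtype=float)
    for t in range(n_days):
        mu_tilde = mu_hat + B @ (z - mu_hat)
        z = rng.multivariate_normal(mu_tilde, Sigma_tilde)
        out[t] = z
    return out

sim = simulate(y[-1], n_days=365)
```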
UKCP18 and simulated temperatures again

Multivariate Gaussian other-scenario simulations
- The average over multiple grid cells in a region \(R_j\) can be written as \[
\bar Y_t(R_j) = \dfrac{1}{n_{R_j}} \sum_{i \in R_j} Y_{t, i}
\] where \(n_{R_j}\) is the number of cells in region \(R_j\), which can also be written as \[
\bar Y_t(R_j) = \mathbf{r}_j^{\textsf{T}} \mathbf{Y}_t
\] for an appropriately formed vector \(\mathbf{r}_j\).
Multivariate Gaussian other-scenario simulations continued
- Because of the MVN assumption, we know the distribution of \(\bar Y_t(R_j)\) \[
\bar Y_t(R_j) = \mathbf{r}_j^{\textsf{T}} \mathbf{Y}_t \sim N(\mathbf{r}_j^{\textsf{T}} \boldsymbol{\mu}, \mathbf{r}_j^{\textsf{T}} \boldsymbol{\Sigma} \mathbf{r}_j)
\]
- And we also know that \[
\mathbf{Y}_t \mid \bar Y_t(R_j) = z \sim MVN_n\left(\boldsymbol{\mu}^* , \boldsymbol{\Sigma}^*\right)
\] with closed-form expressions for \(\boldsymbol{\mu}^*\) and \(\boldsymbol{\Sigma}^*\).
- We can extend this to \(p\) constraints with \(p \times n\) matrix \(\mathbf{A}\) so that \[
\mathbf{Y}_t \mid \mathbf{A} \mathbf{Y}_t = \mathbf{z} \sim MVN_n(\ldots, \ldots)
\]
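One way to sketch the conditioning on \(\mathbf{A} \mathbf{Y}_t = \mathbf{z}\) is with the standard results \(\boldsymbol{\mu}^* = \boldsymbol{\mu} + \mathbf{K}(\mathbf{z} - \mathbf{A}\boldsymbol{\mu})\) and \(\boldsymbol{\Sigma}^* = \boldsymbol{\Sigma} - \mathbf{K}\mathbf{A}\boldsymbol{\Sigma}\), where \(\mathbf{K} = \boldsymbol{\Sigma}\mathbf{A}^{\textsf{T}}(\mathbf{A}\boldsymbol{\Sigma}\mathbf{A}^{\textsf{T}})^{-1}\). The two regions, the target averages and all numbers below are made up for illustration. Because \(\boldsymbol{\Sigma}^*\) is singular, the sketch draws unconditionally and then corrects each draw, which is one possible sampling choice and not necessarily the one used here.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 6
mu = np.full(n, 10.0)
Sigma = 4.0 * 0.8 ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))

# Two hypothetical regions: cells 0-2 and cells 3-5. Each row of A is an r_j.
A = np.zeros((2, n))
A[0, :3] = 1 / 3
A[1, 3:] = 1 / 3
z = np.array([12.0, 9.0])  # target regional averages (made up)

# K = Sigma A^T (A Sigma A^T)^{-1}, computed with a linear solve.
K = np.linalg.solve(A @ Sigma @ A.T, A @ Sigma).T
mu_star = mu + K @ (z - A @ mu)
Sigma_star = Sigma - K @ A @ Sigma

# Draw from the unconditional model, then shift each draw so that A y = z.
Y = rng.multivariate_normal(mu, Sigma, size=1000)
Y_star = Y + (z - Y @ A.T) @ K.T
```

Every row of `Y_star` reproduces the target regional averages exactly, while the within-region variation still follows \(\boldsymbol{\Sigma}\).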
Some other-scenario temperature data

Projections at Exeter

Wind components \(u\) and \(v\)
- A Gaussian model seems okay for \(u\) and \(v\)
- But if we treat them as independent, the resulting wind speeds will be nonsensical

Wind components \(u\) and \(v\)
- Simulated \(u\) and \(v\) pairs

Wind components \(u\) and \(v\)
- When conditioning on lo-res \(u\) and \(v\) pairs, it’s not clear whether to average \(u\)s and \(v\)s
- it might be more sensible to average the wind speed
- but this has to be approximated numerically
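A small sketch of why the choice matters: wind speed \(\sqrt{u^2 + v^2}\) is a nonlinear function of the components, so the average speed over hi-res cells is not the speed of the averaged components, and a constraint on it is no longer of the linear form \(\mathbf{A}\mathbf{Y}_t = \mathbf{z}\). The means, covariance and cell count below are made up.

```python
import numpy as np

rng = np.random.default_rng(5)

# 25 hypothetical hi-res cells inside one lo-res cell, with correlated u and v.
cov = [[4.0, 1.5], [1.5, 3.0]]
uv = rng.multivariate_normal([2.0, -1.0], cov, size=25)
u, v = uv[:, 0], uv[:, 1]

speed_of_mean = np.hypot(u.mean(), v.mean())  # average components, then speed
mean_of_speed = np.hypot(u, v).mean()         # speed per cell, then average

print(f"speed of averaged components: {speed_of_mean:.2f}")
print(f"average of cell wind speeds:  {mean_of_speed:.2f}")
```

The average of the speeds can never be smaller than the speed of the averaged components, since opposing winds cancel in the component averages but not in the speeds.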
Non-Gaussian data
- Now suppose that \(Y_{t, i} \sim F_{i}\)
- marginal cumulative distribution function (CDF) \(F_{i}\) has empirical estimate \(\hat F_{i}\)
- and corresponding inverse \(\hat F_{i}^{-1}\)
- Then \(\Phi^{-1}(F_{i}(Y_{t, i})) \sim N(0, 1)\), where \(\Phi\) denotes the standard Gaussian CDF
- We proceed by modelling \[
\hat Z_{t, i} = \Phi^{-1}(\hat F_{i}(Y_{t, i})), \qquad \hat{\mathbf{Z}}_{t} = (\hat Z_{t, 1}, \ldots, \hat Z_{t, n})^{\textsf{T}} \sim MVN_n(\boldsymbol{\mu}, \boldsymbol{\Sigma})
\]
- Then we can simulate \(\mathbf{Z}_{t} = (Z_{t, 1}, \ldots, Z_{t, n})^{\textsf{T}}\) from \(MVN_n(\hat{\boldsymbol{\mu}}, \hat{\boldsymbol{\Sigma}})\) and obtain \(\hat Y_{t, i} = \hat F_{i}^{-1}(\Phi(Z_{t, i}))\).
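The transform, fit and back-transform steps above could be sketched as follows. The rank-based empirical CDF scaled by \(T + 1\), the use of `np.quantile` as the empirical inverse, and the toy data are all assumptions made for illustration.

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(4)
std_norm = NormalDist()
Phi = np.vectorize(std_norm.cdf)          # standard Gaussian CDF
Phi_inv = np.vectorize(std_norm.inv_cdf)  # and its inverse

# Toy non-Gaussian data: T days by n cells of a percentage on [0, 100].
T, n = 400, 3
latent = rng.multivariate_normal(np.zeros(n), 0.6 + 0.4 * np.eye(n), size=T)
y = 100 * Phi(latent) ** 2  # skewed margins

# Empirical CDF via ranks, scaled by T + 1 so values stay strictly inside (0, 1).
ranks = y.argsort(axis=0).argsort(axis=0) + 1
z = Phi_inv(ranks / (T + 1))

# Fit the multivariate Gaussian on the transformed scale.
mu_hat = z.mean(axis=0)
Sigma_hat = np.cov(z, rowvar=False)

# Simulate on the Gaussian scale, then map back through the empirical quantiles.
z_sim = rng.multivariate_normal(mu_hat, Sigma_hat, size=1000)
u_sim = Phi(z_sim)
y_sim = np.column_stack([np.quantile(y[:, i], u_sim[:, i]) for i in range(n)])
```

With an empirical inverse the simulated values cannot leave the observed range of each cell, which is worth keeping in mind when extremes matter.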
Non-Gaussian data
- Non-Gaussian margins remove the property that \(\bar Y_t\) is also Gaussian
- However, \(\bar Z_t = n^{-1} \mathbf{1}_n^{\textsf{T}} \mathbf{Z}_t\) is Gaussian
- We adopt the approach of establishing an empirical relationship between \(\bar Z_t\) and \(\bar Y_t\) by assuming that \(\hat{\bar Z}_t = g(\hat{\bar Y}_t) + \epsilon_t\)
Non-Gaussian data
- Let’s consider cloud cover percentage
- Here we see \(\hat{\bar z}_t\) plotted against \(\hat{\bar y}_t\), together with \(\hat g()\), a cubic spline estimate of \(g()\)
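A rough sketch of how such a relationship might be estimated and then used to turn target averages on the original scale into Gaussian-scale targets. A cubic polynomial least-squares fit (`np.polyfit`) is swapped in here for the cubic spline, purely to keep the example dependency-free; the toy cloud cover data and all names are hypothetical.

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(6)
Phi = np.vectorize(NormalDist().cdf)
Phi_inv = np.vectorize(NormalDist().inv_cdf)

# Toy cloud cover percentages: T days by n cells.
T, n = 600, 5
latent = rng.multivariate_normal(np.zeros(n), 0.7 + 0.3 * np.eye(n), size=T)
y = 100 * Phi(latent)

# Gaussian-scale values via the rank-based empirical CDF of each cell.
ranks = y.argsort(axis=0).argsort(axis=0) + 1
z = Phi_inv(ranks / (T + 1))

# Daily averages on both scales.
y_bar = y.mean(axis=1)
z_bar = z.mean(axis=1)

# Cubic polynomial stand-in for the spline estimate of g().
g_hat = np.poly1d(np.polyfit(y_bar, z_bar, deg=3))

# Convert target average percentages into Gaussian-scale averages.
targets = np.array([10, 30, 50, 70, 90, 95])
z_targets = g_hat(targets)
```

Each entry of `z_targets` could then serve as the value that the Gaussian-scale average is conditioned on.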

Non-Gaussian data
- Gaussian scale CLT simulations given total percentages of 10, 30, 50, 70, 90 and 95%.

Non-Gaussian data
- Original scale CLT simulations given total percentages of 10, 30, 50, 70, 90 and 95%.

Even more variables
- We don’t just need to model \(u\) and \(v\) as dependent
- we should assume all the variables going into mesoclim and then JULES aren’t independent

Seasonal variation
- Variables change from month to month
- not just means, but also covariances

Summary
- Generating UKCP18-like data for other SSPs requires a few considerations
- Dependencies between data must be preserved
- if variables are treated as independent, this will have consequences for downscaling and subsequent estimates of tree growth
- Gaussian models are mathematically convenient, but transformations are needed for them to sensibly represent some variables, such as cloud cover
- Conditioning on low-res other-SSP data seems to give sensible projections into the future
- SSTs still need to be considered
- and probably need to be consistent with the UKCP18 variables